
bigmemory (version 3.6)

biglm.big.matrix, bigglm.big.matrix: Use Thomas Lumley's "biglm" package with a "big.matrix"

Description

These functions wrap Thomas Lumley's biglm package so that it can be used with data stored in big.matrix objects.

Usage

biglm.big.matrix(formula, data, fc=NULL, chunksize=NULL, weights=NULL, sandwich=FALSE)
bigglm.big.matrix(formula, data, family=gaussian(), fc=NULL, chunksize=NULL,
          weights=NULL, sandwich=FALSE, maxit=8, tolerance=1e-7, start=NULL)

Arguments

formula
a model formula.
data
a big.matrix.
fc
either column indices or names of variables that are factors.
chunksize
an integer giving the maximum size of each chunk of data to process iteratively.
weights
a one-sided, single term formula specifying weights (see biglm for more information).
sandwich
TRUE to compute the Huber/White sandwich covariance matrix (see biglm for more information).
family
a glm family object.
maxit
maximum number of Fisher scoring iterations.
tolerance
tolerance for the change in coefficients (as a multiple of the standard error).
start
optional starting values for coefficients. If NULL, maxit should be at least 2 as some quantities will not be computed on the first iteration.
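
The weights argument expects a one-sided, single-term formula rather than a numeric vector. As a minimal sketch in base R only, the snippet below shows what such a formula looks like and how its shape can be checked before it is passed on. The column name Petal.Width is a hypothetical choice for illustration, and these checks are not part of bigmemory or biglm.

```r
# Hypothetical weights column; substitute a column of your own big.matrix.
w <- ~ Petal.Width

# A one-sided formula has no left-hand side, so it has length 2
# (the `~` operator plus the right-hand side).
is_one_sided <- length(w) == 2L

# A single-term formula has exactly one term label on the right-hand side.
term_labels <- attr(terms(w), "term.labels")
is_single_term <- length(term_labels) == 1L

is_one_sided
is_single_term
term_labels
```

A formula such as ~ a + b would fail the single-term check, and y ~ a would fail the one-sided check.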

Value

  • an object of class biglm.

Details

See biglm package for more information; chunksize defaults to floor(nrow(data)/ncol(data)^2).
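
To get a feel for the default chunk size, the formula above can be evaluated by hand. The helper below is a hypothetical convenience function, not part of bigmemory; it only reproduces the documented arithmetic so that the default can be estimated before fitting.

```r
# Hypothetical helper (not part of bigmemory): reproduces the documented
# default, floor(nrow(data) / ncol(data)^2).
default_chunksize <- function(n_rows, n_cols) {
  floor(n_rows / n_cols^2)
}

# An iris-sized matrix: 150 rows, 5 columns.
default_chunksize(150, 5)    # 6

# A larger matrix: one million rows, 10 columns.
default_chunksize(1e6, 10)   # 10000
```

Because the divisor grows with the square of the column count, wide matrices get much smaller default chunks than narrow ones; passing chunksize explicitly overrides the default.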

References

Algorithm AS274. Applied Statistics (1992), Vol. 41, No. 2.

Thomas Lumley (2005). biglm: bounded memory linear and generalized linear models. R package version 0.4.

See Also

biglm, big.matrix

Examples

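# A deliberately simple example using the iris data. It shows that the
# wrapper to Lumley's biglm() function produces the same answer as the
# plain old lm() function.

library(bigmemory)  # as.big.matrix() and, in this version, biglm.big.matrix()
library(biglm)      # Lumley's package, which the wrapper calls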

x <- matrix(unlist(iris), ncol=5)
colnames(x) <- names(iris)
x <- as.big.matrix(x)
head(x)

silly.biglm <- biglm.big.matrix(Sepal.Length ~ Sepal.Width + Species, data=x, fc="Species")
summary(silly.biglm)

y <- data.frame(x[,])
y$Species <- as.factor(y$Species)
head(y)

silly.lm <- lm(Sepal.Length ~ Sepal.Width + Species, data=y)
summary(silly.lm)
